
Retrieval Augmented Generation - RAG

Retrieval Augmented Generation (RAG) augments the AIFS foundation models with knowledge from outside their training data, such as proprietary product information or user-provided documents. AIFS-RAG automatically parses and chunks the documents, creates and stores the embeddings, and uses vector search to retrieve the content relevant to a user query.


In this example, we'll create an assistant that can help answer questions about Open Telekom Cloud (OTC).

Step 1: Create a new session for the user

Create a new session for the user with a unique session identifier.

import requests

url = "http://localhost:8000/v3/session"

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}

json_data = {
    'userId': 'test_1',
    'sessionId': 'session_1',
}

# Create the session (POST is assumed for this endpoint)
response = requests.post(url, headers=headers, json=json_data)

You can use any identity manager to authenticate the user and obtain the userId.
For easier management, generate a new sessionId for each session-creation request.
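
A session identifier can be generated on the client side. The sketch below uses Python's standard uuid module. The helper name new_session_payload is hypothetical, and it is an assumption that the service accepts an arbitrary string as sessionId.

```python
import uuid


def new_session_payload(user_id: str) -> dict:
    """Build a session-creation payload with a fresh, random sessionId.

    Hypothetical helper: the field names mirror the Step 1 request body.
    """
    return {
        "userId": user_id,
        # uuid4().hex is 32 hexadecimal characters, random per call
        "sessionId": f"session_{uuid.uuid4().hex}",
    }


payload = new_session_payload("test_1")
print(payload["sessionId"])
```

The resulting dictionary can be passed as json=payload in the session-creation request.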

Step 2: Upload files and add them to the session context

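# Upload file
url = "http://localhost:8000/..."  # upload endpoint; the path is not given here
headers = {
    'accept': 'application/json',
    # Leave Content-Type unset: requests adds the multipart boundary
    # only when it builds this header itself from files=
    # 'Content-Type': 'multipart/form-data',
}

with open('Service.pdf', 'rb') as f:
    files = {
        'file': ('Service.pdf', f, 'application/pdf'),
    }
    # POST is assumed for the upload request
    response = requests.post(url, headers=headers, files=files)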

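# Add the file to the session context (this indexes the file)
url = "http://localhost:8000/..."  # indexing endpoint; the path is not given here
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}

json_data = {
    'userId': 'test_1',
    'sessionId': 'session_1',
    'files': [
        {
            'name': 'Service.pdf',
            'metadata': {
                'Author': 'Open Telekom Cloud',
            },
        },
    ],
}

# POST is assumed for this request
response = requests.post(url, headers=headers, json=json_data)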

Step 3: Update user configuration to set model and RAG parameters

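url = "http://localhost:8000/..."  # user configuration endpoint; the path is not given here
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}

json_data = {
    'userId': 'test_1',
    'sessionId': 'session_1',
    'modelSettings': {
        'chatModel': 'Llama-2',
    },
}

# POST is assumed; the endpoint may expect a different method
response = requests.post(url, headers=headers, json=json_data)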

Step 4: Create a chat streaming run and check output

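import requests

url = "http://localhost:8000/..."  # chat endpoint; the path is not given here
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}

json_data = {
    'userId': 'test_1',
    'sessionId': 'session_1',
    'prompt': 'What is OTC?',
}

# POST is assumed for this request
response = requests.post(url, headers=headers, json=json_data)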
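
To check the output of a streaming run, the response body can be read incrementally. The sketch below is one way to do this with the requests library. It assumes the service streams plain UTF-8 text, which this page does not confirm, and the helper name read_stream is hypothetical.

```python
def read_stream(response) -> str:
    """Print streamed text as it arrives and return the full answer.

    Assumes `response` was obtained with requests.post(..., stream=True)
    and that the body is plain text (an assumption about the wire format).
    """
    # iter_content only decodes to str when an encoding is set
    response.encoding = response.encoding or "utf-8"
    parts = []
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        if chunk:  # skip empty chunks
            print(chunk, end="", flush=True)
            parts.append(chunk)
    return "".join(parts)


# Usage (url, headers and json_data as defined for Step 4):
# import requests
# with requests.post(url, headers=headers, json=json_data, stream=True) as response:
#     response.raise_for_status()
#     answer = read_stream(response)
```

If the service streams structured events instead of plain text, each chunk would need to be parsed before printing.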

How it works

The AIFS-RAG tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model's responses. The AIFS-RAG tool:

  • Runs semantic searches across both assistant and vector stores.
  • Filters results by a similarity threshold and keeps only the top-k most similar chunks.
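
The filtering step can be illustrated with a minimal sketch. It is not the AIFS-RAG implementation: cosine similarity, the function names, and the threshold value of 0.5 are assumptions chosen for illustration, while top_k=5 mirrors the default number of chunks added to the context.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, threshold=0.5, top_k=5):
    """Return up to top_k (score, text) pairs with score >= threshold, best first.

    `chunks` is a list of (text, embedding) pairs; threshold is an assumed value.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Toy 2-dimensional embeddings, made up for the example
chunks = [
    ("OTC is a public cloud.", [1.0, 0.0]),
    ("Pricing overview.", [0.6, 0.8]),
    ("Unrelated text.", [0.0, 1.0]),
]
results = retrieve([1.0, 0.0], chunks, threshold=0.5, top_k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Here the third chunk falls below the threshold and is dropped, and the remaining two are returned in order of similarity.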

By default, the AIFS-RAG tool uses the following settings:

  • Chunk size: 1000 sentences
  • Chunk overlap: 20 sentences
  • Embedding model: text-embedding-ada-002 at 1536 dimensions
  • Maximum number of chunks added to context: 5 (can be set higher)
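
To make the chunk size and overlap settings concrete, the following sketch splits a list of sentences into overlapping chunks. It illustrates the general technique under the defaults listed above and is not the AIFS-RAG chunker; the function name chunk_sentences is hypothetical, and sentence splitting itself is assumed to have happened already.

```python
def chunk_sentences(sentences, chunk_size=1000, overlap=20):
    """Split a list of sentences into chunks of at most chunk_size items.

    Consecutive chunks share `overlap` sentences, so context at a chunk
    boundary appears in both neighbours.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + chunk_size])
        if start + chunk_size >= len(sentences):
            break  # the last chunk already reaches the end of the document
    return chunks


# Small values make the overlap visible
sentences = [f"s{i}" for i in range(10)]
small = chunk_sentences(sentences, chunk_size=4, overlap=1)
for chunk in small:
    print(chunk)
```

With chunk_size=4 and overlap=1, each chunk starts with the last sentence of the previous one.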

Supported File Formats


© Deutsche Telekom AG